A crash course

Why should we use computer science tools in science?

Replicability Crisis

  • Failures to replicate (e.g., Ebersole et al., 2016; Open Science Collaboration, 2015; Wagenmakers et al., 2016).
  • Fraud (Bhattacharjee, 2013).
  • Improbable findings have been published in top-tier journals (e.g., Bem, 2011).



Proposed Solutions

  • Change the incentive structure (e.g., Nosek et al., 2015; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012).
  • Be transparent and open (e.g., Rouder, 2016; Wicherts, Bakker, & Molenaar, 2011).
  • Change the statistical approach (e.g., Benjamin et al., 2018; Erdfelder, 2010; Rouder et al., 2016).



Proposed Solutions

These solutions assume that people do things on purpose.





Things People Don’t Do on Purpose

Mistakes

  • Errors when programming an experiment or study (e.g., randomization).
  • Equipment failure (e.g., responses are collected unreliably).
  • Lost data.
  • Errors when coding the analysis (e.g., during data cleaning).
  • Errors when reporting the analysis (e.g. typos).


Consequences

  • Prevalence: Roughly half of the psychology papers published between 1985 and 2013 that used significance tests contained at least one p-value that was inconsistent with its reported test statistic and degrees of freedom (Nuijten, Hartgerink, Assen, Epskamp, & Wicherts, 2016).
  • Bias: Simple mistakes tend to go in scientists’ preferred direction (Gould, 1996).
  • Persistence: Once in the literature, mistakes are almost impossible to detect (Rouder, Haaf, & Snyder, 2019).

Coding helps

  • Coding your analysis instead of “clicking” it leaves a trail.
  • Working on code together in team science (check each other’s work!).
  • Share your code with others.
  • Version control can help!

Version control

  • Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
  • Keeps a history of all changes (who, what, when).
  • Helps to avoid mistakes (working on the wrong version, deleting files, …).
  • Merges changes from multiple collaborators into one file.
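To make "history of all changes" concrete, here is a minimal sketch you could paste into the R Studio Terminal. It assumes git is installed; the throwaway folder, the file analysis.R and the commit messages are hypothetical. It records two versions of one file, asks git who changed what and when, and then recalls the older version.

```shell
workdir="$(mktemp -d)"                     # throwaway folder, so nothing real is touched
cd "$workdir"
git init -q history-demo
cd history-demo
git config user.name "My commit name"      # identity for this demo repository only
git config user.email "myemail@email.com"

echo "mean(rt)" > analysis.R               # hypothetical analysis script
git add analysis.R
git commit -q -m "Compute mean reaction time"

echo "median(rt)" > analysis.R             # change of mind: use the median
git add analysis.R
git commit -q -m "Use median instead of mean"

# Who changed what, and when (newest commit first).
git log --format='%h %an %ad %s' --date=short

# Recall the earlier version of the file without changing the current one.
git show HEAD~1:analysis.R
```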

Do I Want to Use git or github?


github

  • One of several platforms that use git as their version control system.
  • Provides free and easy storage of repositories (projects).
  • Widely used.

git

  • A very popular version control system.
  • The de facto standard in software development.
  • A mature, feature-rich system, but not really designed for beginners!
  • Learning to navigate github is for now; learning git is for life!

What I would like to show you about git

  • How to use a terminal
  • Git
    • What is it good for?
    • What is it?
    • What can it do?
  • Set-up for your computer
    • GUI/terminal
    • R Studio & git
    • SSH
    • Set name & email address
  • Your first repo
    • Github and GitLab
    • In R Studio
    • gitignore
    • README
  • Workflow
    • Add, Commit, Push
    • Diff
    • Merge, Branches, Tagging… (all the cool stuff)
    • What happens if something goes wrong? (And it will.)

What we have time for

  • How to use a terminal
  • Git
    • What is it good for?
    • What is it?
    • What can it do?
  • Set-up for your computer
    • GUI/terminal
    • R Studio & git
    • SSH
    • Set name & email address
  • Your first repo
    • Github and GitLab
    • In R Studio
    • gitignore
    • README
  • Workflow
    • Add, Commit, Push
    • Diff
    • Merge, Branches, Tagging… (all the cool stuff)
    • What happens if something goes wrong? (And it will.)

What is it?

What can it do?

  • A lot! That is why I can only cover part of its functionality here.
  • Working on one product in (large) teams.
  • Working on things that can break.
  • git can only merge text files and show line-by-line changes in them.
  • Binary files (images, PDFs, etc.) can be tracked and pushed, but git cannot show what changed inside them.
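A quick way to see the difference between text and binary files, sketched in a throwaway repository. The tiny file figure.bin is a hypothetical stand-in for an image or PDF: it contains a NUL byte, which is enough for git to treat it as binary.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q diff-demo
cd diff-demo
git config user.name "My commit name"
git config user.email "myemail@email.com"

echo "Participants: 40" > notes.txt        # a text file
printf '\000\001\002' > figure.bin         # stand-in for an image (contains a NUL byte)
git add notes.txt figure.bin
git commit -q -m "Add notes and figure"

echo "Participants: 42" > notes.txt        # edit both files
printf '\000\001\003' > figure.bin

git diff notes.txt     # text: shows the removed and the added line
git diff figure.bin    # binary: only reports that the files differ
```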

Setup for your computer

Using git

  • Git itself is a command-line program.
  • You can either use the terminal or install a graphical user interface (GUI).
  • github has its own GUI (GitHub Desktop). Some people like it.
  • We will use R Studio as our user interface.

R Studio & git

Tools ➤ Global Options ➤ Git/SVN.

Make sure the first box is ticked and that the path to the git executable (“git.exe” on Windows) is filled in.

Set name & email address

  • Open the Terminal in R Studio.
  • Set an email address and user name for git.
git config --global user.email "myemail@email.com"
git config --global user.name "My commit name"
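To check that both settings were stored, call git config with only the setting name. So that this sketch leaves your real ~/.gitconfig alone, its first two lines point HOME at a throwaway folder; that redirect is purely an assumption of the demo, so leave it out when setting up your own computer.

```shell
# Demo only: use a throwaway home folder so your real settings stay untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.email "myemail@email.com"
git config --global user.name "My commit name"

# With only a setting name, git config prints the stored value.
git config --global user.name
git config --global user.email

# List all global settings together with the file they live in.
git config --global --list --show-origin
```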


First Repository!

Github

In R Studio

File ➤ New Project ➤ Version Control ➤ Git

In R Studio

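  • You will have to type in your github user name and, instead of your password, a personal access token (github no longer accepts account passwords for git over HTTPS).
  • Clones the github repository into a local folder and sets it up as an R project (opening the project starts a clean R Studio session).
  • Your local copy includes the README file from github.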
  • Adds a .gitignore file.
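Behind the scenes this step clones the repository. In the terminal you would run git clone with your repository's URL, for example https://github.com/<your-user-name>/myfirstrepo.git (a placeholder). So that this sketch runs offline, a local "bare" repository stands in for github; the folder names are hypothetical.

```shell
workdir="$(mktemp -d)"                         # throwaway folder
cd "$workdir"
git init -q --bare github-stand-in.git         # plays the role of the github repository

# With github: git clone https://github.com/<your-user-name>/myfirstrepo.git
git clone -q github-stand-in.git myfirstrepo   # creates the local working copy
cd myfirstrepo
git remote -v                                  # "origin" points back to the remote copy
```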


gitignore

  • Specifies intentionally untracked files to ignore.
  • Each line in a gitignore file specifies a pattern.
  • R Studio pre-specifies some useful patterns.
  • For R Markdown, also ignore cache and generated figure files, e.g., .tiff, .eps, .rdb, .rdx.
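A minimal sketch of how the patterns work, in a throwaway repository with hypothetical file names. The first four patterns resemble the defaults R Studio writes; the rest cover the R Markdown file types listed above. git status lists only files that are not ignored, and git check-ignore -v reports which line of .gitignore matched.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q ignore-demo
cd ignore-demo

# One pattern per line; lines starting with # are comments.
cat > .gitignore <<'EOF'
.Rproj.user
.Rhistory
.RData
.Ruserdata
# R Markdown cache and generated figures
*.rdb
*.rdx
*.tiff
*.eps
EOF

touch analysis.Rmd analysis.rdb figure.tiff .Rhistory

git status --porcelain          # only .gitignore and analysis.Rmd show up as untracked (??)
git check-ignore -v figure.tiff # which pattern ignores this file
```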


README

  • Tell other people (and yourself in a year) why your project is useful, what they can do with your project, and how they can use it.
  • On github, the default README file is a Markdown file (README.md).


Git Workflow

Do some work

Git Add


git add .gitignore myfirstrepo.Rproj
  • Press Tab to autocomplete file names in the terminal!
  • Note that many user interfaces combine git add and git commit (next step).
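In the terminal you can watch git add move files from "untracked" to "staged" with git status. The sketch assumes a fresh throwaway repository in which empty files stand in for the ones R Studio creates.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q myfirstrepo
cd myfirstrepo
touch .gitignore myfirstrepo.Rproj         # empty stand-ins for R Studio's files

git status --short                         # "??" = untracked
git add .gitignore myfirstrepo.Rproj
git status --short                         # "A " = staged, ready to commit
```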

Git Commit


git add .gitignore myfirstrepo.Rproj
git commit -m "My first commit"

Commits always have a commit message.
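A common convention (not something git enforces) is a short summary line, optionally followed by a blank line and more detail. Passing -m twice produces that layout, because git joins several -m messages as separate paragraphs. Repository and file names in this sketch are hypothetical.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q commit-demo
cd commit-demo
git config user.name "My commit name"
git config user.email "myemail@email.com"

touch .gitignore myfirstrepo.Rproj
git add .gitignore myfirstrepo.Rproj
# First -m: short summary. Second -m: optional details in a new paragraph.
git commit -q -m "Add R Studio project files" \
  -m "Sets up the R project and the default .gitignore."

git log -1 --format=%s    # summary line only
git log -1 --format=%b    # body only
```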

Commit message


Git Push

git add .gitignore myfirstrepo.Rproj
git commit -m "My first commit"
git push

Congratulations, you did it! Your local and remote repositories are now up to date.
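To try push end to end without a github account, this sketch uses a local bare repository as a hypothetical stand-in for github. A clone of an empty repository may not have an upstream branch yet, so the first push names it explicitly with -u origin HEAD; after that, a plain git push is enough, as on the slide.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q --bare remote.git              # stands in for the github repository
git clone -q remote.git myfirstrepo
cd myfirstrepo
git config user.name "My commit name"
git config user.email "myemail@email.com"

touch .gitignore myfirstrepo.Rproj
git add .gitignore myfirstrepo.Rproj
git commit -q -m "My first commit"
git push -q -u origin HEAD                 # -u remembers where later pushes go

git status -sb                             # first line shows the branch and its upstream
```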

Git Pull

Before you start working on the project the next time:

git pull

Pull, work some more, repeat.
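The pull, work, push cycle pays off as soon as a project lives in more than one place. This sketch simulates two copies (say, a laptop and an office computer, both hypothetical) that share a local bare repository standing in for github: a change pushed from one copy arrives in the other with git pull.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q --bare remote.git              # stands in for github

# Copy 1 (laptop): first commit, then push.
git clone -q remote.git laptop
cd laptop
git config user.name "My commit name"
git config user.email "myemail@email.com"
echo "# myfirstrepo" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin HEAD
cd ..

# Copy 2 (office computer): clone, change, push.
git clone -q remote.git office
cd office
git config user.name "My commit name"
git config user.email "myemail@email.com"
echo "Data are in data/." >> README.md
git add README.md
git commit -q -m "Describe data folder"
git push -q
cd ..

# Back on the laptop: pull before starting to work.
cd laptop
git pull -q
cat README.md
```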

What changed since the last commit?

git diff HEAD

  • Plain git diff shows only the changes you have not yet staged with git add.
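git diff compares different snapshots depending on its arguments. This sketch, in a throwaway repository with a hypothetical analysis.R, shows an unstaged change, the same change after git add, and git diff HEAD, which covers everything since the last commit.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q diff-demo
cd diff-demo
git config user.name "My commit name"
git config user.email "myemail@email.com"

printf 'n <- 40\n' > analysis.R
git add analysis.R
git commit -q -m "Set sample size"

printf 'n <- 42\n' > analysis.R
git diff            # unstaged change: -n <- 40 / +n <- 42
git add analysis.R
git diff            # now empty: the change is staged
git diff --staged   # what the next commit would contain
git diff HEAD       # everything since the last commit, staged or not
```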

What happens if something goes wrong? (And it will.)
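Two everyday rescues, sketched in a throwaway repository with hypothetical file contents: discarding an edit you have not committed yet, and undoing a commit with git revert, which adds a new commit that reverses the bad one instead of rewriting history. git restore needs git 2.23 or newer; older versions use git checkout -- followed by the file name.

```shell
workdir="$(mktemp -d)"                     # throwaway folder
cd "$workdir"
git init -q undo-demo
cd undo-demo
git config user.name "My commit name"
git config user.email "myemail@email.com"

echo "keep_outliers <- TRUE" > analysis.R
git add analysis.R
git commit -q -m "Keep outliers"

# 1. Unwanted edit, not committed yet: throw it away.
echo "oops" > analysis.R
git restore analysis.R            # older git: git checkout -- analysis.R

# 2. Bad commit: undo it with a new commit that reverses it.
echo "keep_outliers <- FALSE" > analysis.R
git add analysis.R
git commit -q -m "Drop outliers"
git revert --no-edit HEAD         # history keeps both the mistake and its fix

git log --oneline
```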

Summary

  • Add, commit, push, pull.
  • Use it!
  • git documentation and error tracking are great!

Your turn!

Thank you!

Additional info:

Vuorre, M., & Curley, J. P. (2018). Curating Research Assets: A Tutorial on the Git Version Control System. Advances in Methods and Practices in Psychological Science, 1(2), 219–236.

Further references:

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407–425. Retrieved from https://doi.org/10.1037/a0021524

Benjamin, D. J., Berger, J., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Johnson, V. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6.

Bhattacharjee, Y. (2013). The mind of a con man. New York Times, April 26, 2013. Retrieved from http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all

Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. Retrieved from http://ezid.cdlib.org/id/doi:10.17605/OSF.IO/QGJM5

Erdfelder, E. (2010). A note on statistical analysis. Experimental Psychology, 57(1-4). Retrieved from https://doi.org/10.1027/1618-3169/a000001

Gould, S. J. (1996). The mismeasure of man. New York: WW Norton & Company.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

Nuijten, M. B., Hartgerink, C. H., Assen, M. A. van, Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6521), 943. Retrieved from https://doi.org/10.1126/science.aac4716

Rouder, J. N. (2016). The what, why, and how of born-open data. Behavior Research Methods, 48, 1062–1069. Retrieved from https://doi.org/10.3758/s13428-015-0630-z

Rouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing mistakes in psychological science. Advances in Methods and Practices in Psychological Science.

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is there a free lunch in inference? Topics in Cognitive Science, 8, 520–547.

Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Jr., … Zwaan, R. A. (2016). Registered replication report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. Retrieved from https://doi.org/10.1177/1745691616674458

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 627–633. Retrieved from https://doi.org/10.1177/1745691612463078

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828. Retrieved from http://www.plosone.org/annotation/listThread.action?root=19627